Raw benchmark output for conversion timings (ERR317482)



Based on http://www.ebi.ac.uk/ena/data/view/ERR317482 . This is a submitted BAM file produced 
directly from alignment with BWA. The organism is Human. 

The WRITE timings in the paper were derived by subtracting READ timings from READ+WRITE
operations. The raw results from the benchmark are listed below. These include some SAM reading
and writing benchmarks not presented in the main paper. Note that not all tools have an easy way to
benchmark reading without writing. For Scramble/Samtools we just used flagstats, as the CPU
overhead of that is minimal. Picard's BAM reading timing comes from BAM indexing, which is also
dominated by decoding speed. For Cramtools we edited the code to remove writing and did a file
conversion.
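
The subtraction described above can be sketched as follows. This is a minimal illustration only: it assumes the real-time ("R") column is the one being subtracted, and the function and variable names are hypothetical. The numbers are copied from the 40-qual 9827_2#49 results in this document.

```python
# Real ("R") times in seconds, copied from the 40-qual 9827_2#49 results.
# Assumption: the subtraction is applied to the real-time column.
read_bam = {"Scramble": 76.31, "Samtools": 85.23, "Picard": 118.98}
bam_to_bam = {"Scramble": 758.21, "Samtools": 752.42, "Picard": 653.46}


def write_time(read_plus_write, read_only):
    """Estimate the WRITE cost as (READ+WRITE) minus READ."""
    return round(read_plus_write - read_only, 2)


# Hypothetical helper dictionary: estimated BAM write time per tool.
bam_write = {tool: write_time(bam_to_bam[tool], read_bam[tool]) for tool in read_bam}

for tool, seconds in bam_write.items():
    print(f"{tool:<9} BAM write: {seconds:.2f} s")
```

The same subtraction applies to any other READ+WRITE row, for example BAM2CRAM minus READ BAM for a CRAM write estimate.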



=== 40-qual: 9827_2#49 ===

Scramble   READ BAM    R   76.31   C   74.60   S   1.56
Scramble   BAM2CRAM    R  427.26   C  386.16   S  15.16
Scramble   READ CRAM   R  130.17   C  100.74   S  16.37
Scramble   BAM2BAM     R  758.21   C  741.12   S  15.15
Scramble   BAM2SAM     R  115.94   C  101.01   S  14.64
Scramble   SAM2SAM     R  179.95   C   72.64   S  21.52
Scramble   READ SAM    R  162.84   C   52.27   S   7.90
Samtools   READ BAM    R   85.23   C   76.38   S   1.74
Samtools   BAM2BAM     R  752.42   C  733.11   S  17.64
Samtools   BAM2SAM     R  173.33   C  148.52   S  23.10
Samtools   SAM2SAM     R  213.25   C  179.36   S  29.67
Cramtools  READ BAM    R  115.14   C  115.16   S   2.00
Cramtools  BAM2CRAM    R 1374.69   C 1485.92   S  16.39
Cramtools  READ CRAM   R  199.21   C  207.30   S   4.84
Picard     READ BAM    R  118.98   C  120.57   S   3.21
Picard     BAM2BAM     R  653.46   C  646.14   S  10.39
Picard     BAM2SAM     R  411.16   C  402.46   S  17.81
Picard     SAM2SAM     R  499.16   C  457.50   S  28.87



=== 8-qual: 9827_2#49 ===

Scramble   READ BAM    R   63.26   C   62.09   S   1.03
Scramble   BAM2SAM     R  109.54   C   90.29   S  15.25
Scramble   READ SAM    R   83.03   C   41.08   S   6.93
Scramble   SAM2SAM     R  296.81   C   88.51   S  25.64
Scramble   BAM2CRAM    R  390.73   C  352.28   S  13.57
Scramble   READ CRAM   R  130.26   C   98.96   S  16.28
Scramble   BAM2BAM     R  904.58   C  889.69   S  12.78
Samtools   BAM2SAM     R  160.93   C  137.47   S  20.93
Samtools   SAM2SAM     R  210.90   C  180.70   S  27.89
Samtools   BAM2BAM     R  900.16   C  882.63   S  15.98
Samtools   READ BAM    R   71.74   C   66.01   S   1.01
Cramtools  READ BAM    R   98.41   C   98.20   S   1.89
Cramtools  BAM2CRAM    R 1345.31   C 1450.87   S  14.30
Cramtools  READ CRAM   R  184.52   C  191.79   S   3.91
Picard     READ BAM    R  104.90   C  107.88   S   2.70
Picard     BAM2BAM     R  575.93   C  570.71   S   8.52
Picard     BAM2SAM     R  395.49   C  387.99   S  16.89
Picard     SAM2SAM     R  498.05   C  454.58   S  28.73



File sizes for ERR317482 



-rw-r--r-- 1 jkb team117  3782435322 Jan 13 18:51 9827_2#49.java.cram
-rw-r--r-- 1 jkb team117  6517466114 Jan 13 19:07 9827_2#49.picard.bam
-rw-r--r-- 1 jkb team117 21058600749 Jan 13 17:46 9827_2#49.sam
-rw-r--r-- 1 jkb team117  6499959405 Jan 13 18:24 9827_2#49.samtools.bam
-rw-r--r-- 1 jkb team117  6499339414 Jan 13 18:12 9827_2#49.scramble.bam
-rw-r--r-- 1 jkb team117  3509991876 Jan 13 17:57 9827_2#49.scramble.cram
-rw-r--r-- 1 jkb team117  2326256169 Jan 13 20:44 9827_2#49.bin.java.cram
-rw-r--r-- 1 jkb team117  4896591365 Jan 13 20:59 9827_2#49.bin.picard.bam
-rw-r--r-- 1 jkb team117 21058600903 Jan 13 19:37 9827_2#49.bin.sam
-rw-r--r-- 1 jkb team117  4804360490 Jan 13 20:19 9827_2#49.bin.samtools.bam
-rw-r--r-- 1 jkb team117  4803961469 Jan 13 20:04 9827_2#49.bin.scramble.bam
-rw-r--r-- 1 jkb team117  2159312316 Jan 13 19:47 9827_2#49.bin.scramble.cram



Size breakdowns for ERR317482 

Q40: cram_dump 9827_2#49.scramble.cram | tail

Block 0, total size 228674993 CORE
Block 1, total size 488233 Seqs
Block 2, total size 2794452939 Qual (0.495)
Block 3, total size 288012891 Names
Block 4, total size 7486916 TN/NP
Block 5, total size 175286839 Tags
Block 6, total size 5596722 Soft-clips

Q8: cram_dump 9827_2#49.bin.scramble.cram | tail

Block 0, total size 228674993 CORE
Block 1, total size 488233 Seqs
Block 2, total size 1443773067 Qual (0.256)
Block 3, total size 288012891 Names
Block 4, total size 7486916 TN/NP
Block 5, total size 175286839 Tags
Block 6, total size 5596722 Soft-clips
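
To see how the blocks compare, the per-block sizes can be turned into shares of the summed block total. The sketch below is illustrative only: the helper name is hypothetical, the sizes are copied from the Q40 cram_dump output above, and the sum of the listed blocks is used as the denominator (an assumption; it is slightly smaller than the file size on disk).

```python
# Block sizes in bytes, copied from the Q40 cram_dump output for 9827_2#49.
blocks_q40 = {
    "CORE": 228674993,
    "Seqs": 488233,
    "Qual": 2794452939,
    "Names": 288012891,
    "TN/NP": 7486916,
    "Tags": 175286839,
    "Soft-clips": 5596722,
}


def block_shares(blocks):
    """Return each block's fraction of the summed block sizes."""
    total = sum(blocks.values())
    return {name: size / total for name, size in blocks.items()}


shares = block_shares(blocks_q40)
# Print the blocks from largest to smallest share.
for name, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{name:<10} {share:6.1%}")
```

On these figures the quality block accounts for roughly four fifths of the listed bytes, which is why quality binning has such a large effect on CRAM size.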



Fastq compression metrics 



Quality encoding was tested with fqzcomp, which uses probabilistic modelling and arithmetic coding.
It is not realistic to expect CRAM to compete, given that CRAM requires random access and compresses
each 10000 reads independently; however, it gives a useful baseline to compare against.



$ fqz_comp < 9827_2#49.fastq > /dev/null
Names 1891639400 -> 165489689  (0.087)
Bases 5646323600 -> 1213484148 (0.215)
Quals 5646323600 -> 2456779729 (0.435)

$ fqz_comp < 9827_2#49.bin.fastq > /dev/null
Names 1891639400 -> 165489689  (0.087)
Bases 5646323600 -> 1213484148 (0.215)
Quals 5646323600 -> 1097053253 (0.194)

$ fqz_comp -n2 -s6+ -q3 < 9827_2#49.fastq > /dev/null
Names 1891639400 -> 165489689  (0.087)
Bases 5646323600 -> 1171170282 (0.207)
Quals 5646323600 -> 2228676073 (0.395)

$ fqz_comp -n2 -s6+ -q3 < 9827_2#49.bin.fastq > /dev/null
Names 1891639400 -> 165489689  (0.087)
Bases 5646323600 -> 1182521177 (0.209)
Quals 5646323600 -> 1071594874 (0.190)
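
The figures in parentheses are compressed size divided by original size. A minimal sketch of that arithmetic follows, using the quality-value counts from the fqzcomp output. The function name is hypothetical, and the last two lines rest on an assumption: that the bracketed figure next to the CRAM "Qual" block is the same kind of ratio against the 5646323600 quality values.

```python
def ratio(original_bytes, compressed_bytes):
    """Compressed size as a fraction of the original size, to 3 decimals."""
    return round(compressed_bytes / original_bytes, 3)


QUAL_BYTES = 5646323600  # uncompressed quality values, from the fqzcomp output

fqz_q40 = ratio(QUAL_BYTES, 2456779729)   # default options, 40 quality values
fqz_q8 = ratio(QUAL_BYTES, 1097053253)    # default options, 8 quality bins

# Assumption: the CRAM "Qual" block figures are ratios of the same kind.
cram_q40 = ratio(QUAL_BYTES, 2794452939)
cram_q8 = ratio(QUAL_BYTES, 1443773067)

print(f"fqzcomp quals: Q40 {fqz_q40}, Q8 {fqz_q8}")
print(f"CRAM quals:    Q40 {cram_q40}, Q8 {cram_q8}")
```

Under that assumption the CRAM quality blocks are directly comparable with the fqzcomp baseline for the same data.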



Program versions 

Scramble 1.13.3 
Samtools 0.1.19 
Picard 1.105 
Cramtools 2.00-M98 
Fqzcomp 4.6



Raw benchmark output for conversion timings (ERR251692) 

We used the BAM file produced by the 1000 Genomes consortium, chosen simply because it was 
the last individual. 

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data/NA21144/alignment/NA21144.chrom11.ILLUMINA.bwa.GIH.low_coverage.20130415.bam

This data set differs from the one above in that it has been processed by GATK. This means the quality
values have higher entropy due to the recalibration process, and it now has additional auxiliary tags,
specifically a BQ:Z: string to hold the base alignment qualities. The impact is slightly reduced
CRAM performance.

=== 40-qual: NA21144.chrom11.ILLUMINA.bwa.GIH.low_coverage.20130415 ===

Scramble   READ BAM    Real  12.62   CPU  12.44   System  0.16
Scramble   BAM2CRAM    Real  71.59   CPU  69.34   System  2.02
Scramble   READ CRAM   Real  20.19   CPU  17.78   System  2.38
Scramble   BAM2BAM     Real 118.75   CPU 116.01   System  2.56
Samtools   BAM2BAM     Real 116.03   CPU 112.51   System  3.34
Samtools   READ BAM    Real  12.78   CPU  12.52   System  0.25
Cramtools  READ BAM    Real  20.35   CPU  20.73   System  0.37
Cramtools  BAM2CRAM    Real 262.79   CPU 272.81   System  6.29
Cramtools  READ CRAM   Real  41.02   CPU  42.31   System  2.02
Picard     READ BAM    Real  21.73   CPU  22.30   System  0.46
Picard     BAM2BAM     Real 107.78   CPU 106.64   System  1.95
Scramble   READ BAM    Real  12.73   CPU  12.56   System  0.16
Scramble   BAM2SAM     Real  20.36   CPU  17.43   System  2.89
Scramble   READ SAM    Real   7.21   CPU   6.47   System  0.74
Scramble   SAM2SAM     Real  15.24   CPU  11.66   System  3.56
Samtools   READ BAM    Real  12.71   CPU  12.58   System  0.12
Samtools   BAM2SAM     Real  36.14   CPU  31.79   System  4.27
Samtools   SAM2SAM     Real  40.59   CPU  35.62   System  4.90
Picard     BAM2SAM     Real  72.20   CPU  72.02   System  3.54
Picard     SAM2SAM     Real  81.51   CPU  79.17   System  5.13



=== 8-qual: NA21144.chrom11.ILLUMINA.bwa.GIH.low_coverage.20130415 ===

Scramble   READ BAM    Real  10.55   CPU  10.46   System  0.08
Scramble   BAM2CRAM    Real  66.78   CPU  64.35   System  2.28
Scramble   READ CRAM   Real  19.95   CPU  17.26   System  2.65
Scramble   BAM2BAM     Real 156.38   CPU 154.04   System  2.10
Samtools   BAM2BAM     Real 155.34   CPU 152.22   System  2.88
Samtools   READ BAM    Real  10.89   CPU  10.57   System  0.23
Cramtools  READ BAM    Real  17.27   CPU  17.35   System  0.40
Cramtools  BAM2CRAM    Real 256.18   CPU 273.42   System  3.33
Cramtools  READ CRAM   Real  36.54   CPU  38.97   System  1.34
Picard     READ BAM    Real  18.51   CPU  18.78   System  0.60
Picard     BAM2BAM     Real  96.75   CPU  95.89   System  1.52
Scramble   READ BAM    Real  10.64   CPU  10.52   System  0.11
Scramble   BAM2SAM     Real  18.14   CPU  15.73   System  2.38
Scramble   READ SAM    Real   7.16   CPU   6.36   System  0.79
Scramble   SAM2SAM     Real  15.08   CPU  11.50   System  3.09
Samtools   READ BAM    Real  10.70   CPU  10.58   System  0.11
Samtools   BAM2SAM     Real  30.64   CPU  26.47   System  4.11
Samtools   SAM2SAM     Real  40.27   CPU  35.16   System  5.05
Picard     BAM2SAM     Real  70.06   CPU  68.18   System  4.11
Picard     SAM2SAM     Real  81.32   CPU  78.04   System  5.16



File sizes 



Q40

 624369102 NA21144.chrom11.ILLUMINA.bwa.GIH.low_coverage.20130415.cram
 665880162 NA21144.chrom11.ILLUMINA.bwa.GIH.low_coverage.20130415.java.cram
4504400097 NA21144.chrom11.ILLUMINA.bwa.GIH.low_coverage.20130415.sam
1050955355 NA21144.chrom11.ILLUMINA.bwa.GIH.low_coverage.20130415.samtools.bam
1050828624 NA21144.chrom11.ILLUMINA.bwa.GIH.low_coverage.20130415.scramble.bam
 624369138 NA21144.chrom11.ILLUMINA.bwa.GIH.low_coverage.20130415.scramble.cram
1047474221 NA21144.chrom11.ILLUMINA.bwa.GIH.low_coverage.20130415.picard.bam

Q8

 370414424 NA21144.chrom11.ILLUMINA.bwa.GIH.low_coverage.20130415.bin.cram
 402370594 NA21144.chrom11.ILLUMINA.bwa.GIH.low_coverage.20130415.bin.java.cram
4504400097 NA21144.chrom11.ILLUMINA.bwa.GIH.low_coverage.20130415.bin.sam
 742158141 NA21144.chrom11.ILLUMINA.bwa.GIH.low_coverage.20130415.bin.samtools.bam
 742074704 NA21144.chrom11.ILLUMINA.bwa.GIH.low_coverage.20130415.bin.scramble.bam
 370414468 NA21144.chrom11.ILLUMINA.bwa.GIH.low_coverage.20130415.bin.scramble.cram
 752296669 NA21144.chrom11.ILLUMINA.bwa.GIH.low_coverage.20130415.bin.picard.bam

The differing file sizes for BAM output mainly come from using libIntelDeflator.so with Picard,
which uses a (usually) lighter-weight compression algorithm.



Size breakdowns for ERR251692 (1000 Genomes alignment)

Q40

$ cram_dump NA21144.chrom11.ILLUMINA.bwa.GIH.low_coverage.20130415.scramble.cram | tail

Block 0, total size 28185754 CORE
Block 1, total size 59302 seqs
Block 2, total size 503518165 qual (.496)
Block 3, total size 35608647 names
Block 4, total size 464306 TS/NP
Block 5, total size 44512084 tags
Block 6, total size 9878943 soft-clip

Q8

$ cram_dump NA21144.chrom11.ILLUMINA.bwa.GIH.low_coverage.20130415.bin.scramble.cram | tail

Block 0, total size 28185754 CORE
Block 1, total size 59302 seqs
Block 2, total size 249562499 qual (.246)
Block 3, total size 35608647 names
Block 4, total size 464306 TS/NP
Block 5, total size 44512084 tags
Block 6, total size 9878943 soft-clip





Multi-threaded timing for converting BAM to BAM and BAM to CRAM 

Number of threads (16 to 1); Real, CPU and System times. 
Two runs per test to check reproducibility on a "live" system.

Tested on NA21144.chrom11.ILLUMINA.bwa.GIH.low_coverage.20130415 using the 40 quality
bin version.



Samtools BAM -> BAM

16   R  26.17   C 131.18   S 5.33
16   R  25.99   C 131.40   S 5.91
 8   R  29.75   C 121.98   S 4.45
 8   R  30.31   C 124.46   S 4.34
 4   R  41.08   C 115.75   S 3.37
 4   R  42.49   C 119.98   S 3.51
 2   R  65.69   C 112.95   S 3.13
 2   R  67.18   C 115.[illegible]   S [illegible]
 1   R 116.21   C 112.36   S 3.67
 1   R 116.82   C 113.07   S 3.57

Scramble BAM -> BAM

16   R  10.23   C 134.89   S 5.48
16   R  10.39   C 136.86   S 5.43
 8   R  17.58   C 129.12   S 4.46
 8   R  17.29   C 127.16   S 4.49
 4   R  31.26   C 122.69   S 3.99
 4   R  31.31   C 122.67   S 3.87
 2   R  61.22   C 121.67   S 3.47
 2   R  60.22   C 120.27   S 3.33
 1   R 119.74   C 116.73   S 2.83
 1   R 119.51   C 116.27   S 3.05

Scramble BAM -> CRAM

16   R   8.43   C  95.11   S 4.21
16   R   8.37   C  93.90   S 4.52
 8   R  11.26   C  85.06   S 3.42
 8   R  11.32   C  84.90   S 3.87
 4   R  19.93   C  78.98   S 2.87
 4   R  19.70   C  77.83   S 3.14
 2   R  37.89   C  75.99   S 2.95
 2   R  37.80   C  76.09   S 2.92
 1   R  73.09   C  70.49   S 2.43
 1   R  71.96   C  69.55   S 2.23
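
One way to read the thread-scaling figures is as speedup (single-thread real time divided by N-thread real time) and parallel efficiency (speedup divided by N). The sketch below is only an illustration: it uses the first run at each thread count for Scramble BAM -> BAM on the 1000 Genomes file, and the function name is hypothetical.

```python
# Real times in seconds for Scramble BAM -> BAM, first run at each thread count.
real_times = {16: 10.23, 8: 17.58, 4: 31.26, 2: 61.22, 1: 119.74}


def scaling(real_by_threads):
    """Map thread count -> (speedup, efficiency) relative to one thread."""
    base = real_by_threads[1]
    result = {}
    for threads, seconds in sorted(real_by_threads.items()):
        speedup = base / seconds
        result[threads] = (speedup, speedup / threads)
    return result


scaling_table = scaling(real_times)
for threads, (speedup, efficiency) in scaling_table.items():
    print(f"{threads:>2} threads: speedup {speedup:5.2f}, efficiency {efficiency:.2f}")
```

Repeating the calculation with the Samtools rows gives a direct comparison of how well each tool's BAM writer scales on the same input.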



Multi-threading test on ERR317482 (9827_2#49.bam)

Samtools BAM -> BAM

16   R [illegible]   C [illegible]   S [illegible]
16   R [illegible]   C [illegible]   S [illegible]
16   R [illegible]   C [illegible]   S [illegible]
 8   R 182.20   C 791.51   S 20.30
 8   R 181.80   C 795.59   S 19.69
 8   R 181.47   C 787.58   S 19.59
 4   R 264.07   C 762.76   S 18.33
 4   R 289.41   C 785.40   S 22.28
 2   R 470.40   C 790.09   S 16.08
 2   R 448.51   C 778.48   S 19.52
 1   R 758.80   C 740.58   S 16.95
 1   R 749.01   C 731.52   S 16.27


Scramble BAM -> [illegible: BAM or CRAM]

16   R 58.[illegible]   C [illegible]   S [illegible]
16   R 58.[illegible]   C [illegible]